Utilizing In-Memory Storage for MPI-IO
Authors
Abstract
In contrast to disk- or flash-based storage solutions, the throughput and latency of in-memory storage promise to come close to the best achievable performance. Kove®'s XPD® offers pooled memory for cluster systems. However, the system does not expose access methods that treat the memory like a traditional parallel file system offering POSIX or MPI-IO semantics. The contributions of this poster are:

1. Implementation of an MPI-IO (wrapper) driver for the XPD
2. Thorough performance evaluation of the XPD using IOR (MPI-IO)

This MPI-independent file driver enables high-level I/O libraries (HDF5, NetCDF) to utilize the XPD's pooled memory.

APPROACH

The developed MPI-IO file driver [a] is selectable at runtime via LD_PRELOAD. It checks the file name for the prefix "xpd:" and routes all other accesses to the underlying MPI implementation (a hypothetical interception sketch is given below). The MPI-IO functions needed by HDF5 and IOR are implemented. During MPI_File_open/MPI_File_close, the InfiniBand connections to the XPDs are established/torn down. IOR is used to benchmark performance; barriers between the phases (open, write, read, close) synchronize the processes (see the timing sketch below). The performance analysis varies the following parameters:

• Access granularity: 16 KiB, 100 KByte [b], 1 MiB, 10 MiB
• Processes per node (PPN): 1 to 12
• Nodes: 1 to 98
• Connections: 1 to 14
• Access pattern: sequential and random [c]
• File size: 20 GiB per connection [d]

Performance metrics:

M1. Time for open/close (used to test the scalability of the connections)
M2. Throughput for read/write as reported by IOR
M3. Throughput for read/write (computed from the time of the read/write phase)

Each configuration is run at least three times. Since the throughput reported by IOR includes the overhead of open/close, and these times initially turned out to be significant, those aspects have been investigated separately. A subset of the measurements is also run on the Lustre file system of DKRZ's supercomputer Mistral.

[a] http://github.com/JulianKunkel/XPD-MPIIO-driver
[b] Base 10 has been used on purpose, as this leads to unaligned accesses for file systems, i.e., 100 KByte = 10^5 bytes. All other sizes are base 2.
[c] As expected for a DRAM-based storage system, the two access patterns did not show significant differences. Thus, the poster only contains values for random I/O.
[d] The capacity of the XPD is shared among all users.

OVERVIEW

Performance of all 7,500 conducted runs is summarized below.

Fig. 1: Observed throughput, computed from the read/write phase (M3).
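To illustrate how an LD_PRELOAD wrapper of this kind can dispatch on the file name, the following C sketch intercepts MPI_File_open and forwards non-XPD files through the PMPI profiling interface. This is a minimal sketch under the assumption of a PMPI-based wrapper; xpd_open() is a hypothetical placeholder for the driver's XPD back end, not a function of the published driver.

```c
/*
 * Hypothetical interception sketch (assumes an MPI-3 mpi.h and a
 * PMPI-based LD_PRELOAD wrapper).  xpd_open() is a placeholder name;
 * the actual driver at http://github.com/JulianKunkel/XPD-MPIIO-driver
 * differs in detail.
 */
#include <mpi.h>
#include <string.h>

#define XPD_PREFIX "xpd:"

/* Placeholder for the XPD back end, which would establish the
 * InfiniBand connections during open and record them in *fh. */
extern int xpd_open(MPI_Comm comm, const char *volume, int amode,
                    MPI_Info info, MPI_File *fh);

int MPI_File_open(MPI_Comm comm, const char *filename, int amode,
                  MPI_Info info, MPI_File *fh)
{
    if (strncmp(filename, XPD_PREFIX, strlen(XPD_PREFIX)) == 0) {
        /* "xpd:"-prefixed names are handled by the wrapper itself. */
        return xpd_open(comm, filename + strlen(XPD_PREFIX), amode, info, fh);
    }
    /* Everything else is routed to the underlying MPI library. */
    return PMPI_File_open(comm, filename, amode, info, fh);
}
```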
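The difference between M2 and M3 can be illustrated by timing only the barrier-enclosed read/write phase. The sketch below is not the poster's measurement harness (IOR is used there); the function name and the single-write-per-rank pattern are simplifications introduced here for illustration.

```c
/*
 * Sketch of the barrier-delimited phase timing behind metric M3,
 * assuming each rank writes bytes_per_proc bytes at its own offset.
 * Open/close overhead, which IOR's reported throughput (M2) includes,
 * is deliberately outside the timed region.
 */
#include <mpi.h>

double m3_write_throughput(MPI_File fh, const void *buf,
                           int bytes_per_proc, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    MPI_Barrier(comm);                 /* all ranks enter the write phase together */
    double t0 = MPI_Wtime();

    MPI_Offset offset = (MPI_Offset)rank * bytes_per_proc;
    MPI_File_write_at(fh, offset, buf, bytes_per_proc, MPI_BYTE,
                      MPI_STATUS_IGNORE);

    MPI_Barrier(comm);                 /* the phase ends with the slowest rank */
    double t1 = MPI_Wtime();

    /* M3: total data moved in the phase divided by the phase duration. */
    return ((double)bytes_per_proc * nprocs) / (t1 - t0);
}
```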
Publication date: 2016